A Secure Protocol for Computing String Distance Metrics

Authors

  • Pradeep Ravikumar
  • William W. Cohen
  • Stephen E. Fienberg
Abstract

An important problem is that of finding matching pairs of records from heterogeneous databases while maintaining the privacy of the database parties. As we have shown in earlier work, distance metrics are a useful tool for record linkage in many domains, and thus secure computation of distance metrics is quite important for secure record linkage. In this paper, we consider the computation of a number of distance metrics in a secure multiparty setting. Towards this goal, we propose a stochastic scalar product protocol that is provably consistent and is as secure as an underlying set-intersection cryptographic protocol. We then use this stochastic scalar product protocol to securely compute some standard distance metrics such as TFIDF, SoftTFIDF, and the Euclidean distance metric. Not only are the resulting estimates asymptotically consistent, but experiments show that they are also quite close to the true values after just 1000 samples. These secure distance computations can then be used to perform secure matching of records.
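
The idea of estimating a scalar product from samples and a set intersection can be illustrated with a small sketch. This is not the paper's protocol; it is one simple consistent estimator, written under the assumption that both vectors are non-negative. Each party draws indices with probability proportional to its own vector entries, and the fraction of rounds in which both parties drew the same index, scaled by the product of the two vector sums, converges to the scalar product. The function name `stochastic_dot`, its parameters, and the seeds are hypothetical, and the intersection is computed in the clear here, where a secure setting would use a cryptographic set-intersection protocol instead.

```python
import random


def stochastic_dot(x, y, n_samples=1000, seed_a=1, seed_b=2):
    """Estimate the dot product of two non-negative vectors by sampling.

    Illustrative sketch only: the set intersection below is computed in
    plain text, standing in for a secure set-intersection protocol.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if any(v < 0 for v in x) or any(v < 0 for v in y):
        raise ValueError("this estimator assumes non-negative entries")

    z_x, z_y = sum(x), sum(y)
    if z_x == 0 or z_y == 0:
        return 0.0  # one vector is all zeros, so the dot product is zero

    # Each party samples privately with its own random source.
    rng_a = random.Random(seed_a)
    rng_b = random.Random(seed_b)
    indices = range(len(x))
    draws_a = rng_a.choices(indices, weights=x, k=n_samples)
    draws_b = rng_b.choices(indices, weights=y, k=n_samples)

    # Tag each draw with its round number so that the intersection size
    # counts the rounds in which both parties drew the same index.
    set_a = set(enumerate(draws_a))
    set_b = set(enumerate(draws_b))
    matches = len(set_a & set_b)

    # P(match in one round) = sum_k (x_k / z_x) * (y_k / z_y) = (x . y) / (z_x * z_y)
    return z_x * z_y * matches / n_samples


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    y = [4.0, 3.0, 2.0, 1.0]
    exact = sum(a * b for a, b in zip(x, y))
    print("exact:", exact, "estimate:", stochastic_dot(x, y, n_samples=1000))
```

Because each round is an independent Bernoulli trial, the estimate tightens as `n_samples` grows. A cosine-style similarity such as TFIDF would apply the same estimator to suitably normalized weight vectors.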

Similar resources

Provably secure and efficient identity-based key agreement protocol for independent PKGs using ECC

Key agreement protocols are essential for secure communications in open and distributed environments. Recently, identity-based key agreement protocols have been increasingly researched because of the simplicity of public key management. The basic idea behind an identity-based cryptosystem is that a public key is the identity (an arbitrary string) of a user, and the corresponding private key is ...

Secure Routing Protocol: Affection on MANETs Performance

In mobile ad hoc networks, the absence of infrastructure and the consequent absence of authorization facilities impede the usual practice of establishing a practical criterion for distinguishing nodes as trusted and distrusted. Since all nodes in the MANETs would be used as routers in multi-hop applications, secure routing protocols have a vital role in the security of the network. So evaluating the perf...

Privacy-Preserving Protocols for Edit Distance and Other Dynamic Programming Algorithms

The edit distance between two strings is the minimum number of delete, insert, and replace operations needed to convert one string into another. Computational biology tasks such as comparing genome sequences of two individuals rely heavily on the dynamic programming algorithm for computing edit distances as well as the algorithms for related string-alignment problems. A genome sequence may reve...

Efficient Privacy-Preserving General Edit Distance and Beyond

Edit distance is an important non-linear metric that has many applications ranging from matching patient genomes to text-based intrusion detection. Depending on the application, related string-comparison metrics, such as weighted edit distance, Needleman-Wunsch distance, longest common subsequences, and heaviest common subsequences, can usually fit better than the basic edit distance. When these ...

A Comparison of String Distance Metrics for Name-Matching Tasks

Using an open-source Java toolkit of name-matching methods, we experimentally compare string distance metrics on the task of matching entity names. We investigate a number of different metrics proposed by different communities, including edit-distance metrics, fast heuristic string comparators, token-based distance metrics, and hybrid methods. Overall, the best-performing method is a hybrid s...

Publication date: 2004